graph-sparse lda
Graph-Sparse LDA: A Topic Model with Structured Sparsity
Doshi-Velez, Finale (Harvard University) | Wallace, Byron C. (University of Texas at Austin) | Adams, Ryan (Harvard University)
Topic modeling is a powerful tool for uncovering latent structure in many domains, including medicine, finance, and vision. The goals for the model vary depending on the application: sometimes the discovered topics are used for prediction or another downstream task. In other cases, the content of the topic may be of intrinsic scientific interest. Unfortunately, even when one uses modern sparse techniques, discovered topics are often difficult to interpret due to the high dimensionality of the underlying space. To improve topic interpretability, we introduce Graph-Sparse LDA, a hierarchical topic model that uses knowledge of relationships between words (e.g., as encoded by an ontology). In our model, topics are summarized by a few latent concept-words from the underlying graph that explain the observed words. Graph-Sparse LDA recovers sparse, interpretable summaries on two real-world biomedical datasets while matching state-of-the-art prediction performance.
- North America > United States > Texas > Travis County > Austin (0.14)
- Asia > Middle East > Jordan (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Graph-Sparse LDA: A Topic Model with Structured Sparsity
Doshi-Velez, Finale, Wallace, Byron, Adams, Ryan
Originally designed to model text, topic modeling has become a powerful tool for uncovering latent structure in domains including medicine, finance, and vision. The goals for the model vary depending on the application: in some cases, the discovered topics may be used for prediction or some other downstream task. In other cases, the content of the topic itself may be of intrinsic scientific interest. Unfortunately, even using modern sparse techniques, the discovered topics are often difficult to interpret due to the high dimensionality of the underlying space. To improve topic interpretability, we introduce Graph-Sparse LDA, a hierarchical topic model that leverages knowledge of relationships between words (e.g., as encoded by an ontology). In our model, topics are summarized by a few latent concept-words from the underlying graph that explain the observed words. Graph-Sparse LDA recovers sparse, interpretable summaries on two real-world biomedical datasets while matching state-of-the-art prediction performance.
- Asia > Middle East > Jordan (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > Strength High (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.73)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.49)